Recalibration of Gaussian Neural Network regression models:

the recalibratiNN package

Carolina Musso

Advisor: Prof. Guilherme Rodrigues, Department of Statistics/UnB

Introduction: NN today

GOODFELLOW et al. (2016)


  • Dropout (2013)
  • Adam optimization algorithm (2014)
  • Batch Normalization (2015)

Introduction: Problems

  • Low prediction error is not the only property a NN (or any model in general) should have.

  • It should be able to quantify its uncertainty.

  • Neural networks can be constructed in a way that yields probabilistic results:

    • Optimized by the log-likelihood
  • However, like any model, it can be miscalibrated.

Introduction: Calibration

  • A 95% confidence interval should contain the true output 95% of the time.

Formally:

\[\mathbb{P}\left(Y \leq \hat{F}_Y^{-1}(p)\right) = p, \quad \forall\, p \in [0,1]\]

  • Modern NNs, however, are often not well calibrated.

Observing miscalibration

Consider a synthetic data set \((x_i, y_i),\ i \in \{1, \dots, n\}\), generated by a heteroscedastic non-linear model:

\[ x_i \sim \text{Uniform}(1,10) \]

\[ y_i \mid x_i \sim \text{Normal}(\mu = f_1(x_i),\ \sigma = f_2(x_i)), \quad f_1(x) = 5x^2 + 10~;~ f_2(x) = 30x \]

And the fitted model,

\[ \hat{y}_i = \beta_0 + \beta_1 x_i + \epsilon_i, \quad \epsilon_i \overset{iid}{\sim} N(0, \sigma) \]

Observing miscalibration

Global Coverage: 94.45%.
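A global coverage figure of this kind can be reproduced with a short base-R sketch. The seed and sample size below are illustrative assumptions, not the values behind the slide:

```r
# Simulate the heteroscedastic data and fit the misspecified linear model
set.seed(42)                          # illustrative seed
n <- 10000                            # illustrative sample size
x <- runif(n, 1, 10)                  # x_i ~ Uniform(1, 10)
y <- rnorm(n, mean = 5 * x^2 + 10, sd = 30 * x)

fit  <- lm(y ~ x)                     # homoscedastic linear fit
rmse <- sqrt(mean(residuals(fit)^2))  # single plug-in estimate of sigma

# Share of observations inside the nominal 95% predictive interval
pred    <- fitted(fit)
covered <- (y >= pred - 1.96 * rmse) & (y <= pred + 1.96 * rmse)
mean(covered)                         # close to 0.95 globally
```

Global coverage can sit near the nominal level even when the model is badly miscalibrated locally, which is exactly what the following slides illustrate.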

Observing miscalibration

Synthetic data set \((x_{1i}, x_{2i}, y_i),\ i \in \{1, \dots, n\}\):

\[ x_{1i} \sim \text{Uniform}(-4,4) \ ; \ x_{2i} \sim \text{Uniform}(-6,6) \]

\[ y_i \mid x_i \sim \text{Normal}(\mu = g(x_{1i}, x_{2i}),\ \sigma = 1), \quad g(x_1, x_2) = x_1^3 - 30\sin(x_2) + 30 \]

And the fitted model,

\[ \hat{y}_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \epsilon_i, \quad \epsilon_i \overset{iid}{\sim} N(0, \sigma) \]

Observing miscalibration

Global Coverage: 94.2%.

  • What about higher dimensions?

PIT - Values

  • Histogram of Probability Integral Transform (PIT) values.

  • Let \(F_Y(y)\) be the CDF of a continuous random variable Y, then:

\[ U = F_Y(Y) \sim \text{Uniform}(0, 1) \]

  • In particular, if \(Y \sim Normal(\mu, \sigma)\):

\[ Y = F_Y^{-1}(U) \sim \text{Normal}(\mu, \sigma) \]
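The uniformity of PIT values under a correctly specified model can be checked in a few lines of base R (the mean and standard deviation below are illustrative):

```r
# If the model's CDF is correct, PIT values are Uniform(0, 1)
set.seed(1)
y   <- rnorm(5000, mean = 2, sd = 3)  # data from the true model
pit <- pnorm(y, mean = 2, sd = 3)     # U = F_Y(Y)

hist(pit, breaks = 20)                # roughly flat histogram
ks.test(pit, "punif")                 # formal uniformity check
```

Miscalibration shows up as systematic departures from this flat shape, which is what the histograms on the next slides display.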

Let's return to the first model

\[ x_i \sim \text{Uniform}(1,10) \]

\[ y_i \mid x_i \sim \text{Normal}(\mu = f_1(x_i),\ \sigma = f_2(x_i)), \quad f_1(x) = 5x^2 + 10~;~ f_2(x) = 30x \]

And the fitted model,

\[ \hat{y}_i = \beta_0 + \beta_1 x_i + \epsilon_i, \quad \epsilon_i \overset{iid}{\sim} N(0, \sigma) \]

PIT-Values

Recalibration

  • There are techniques such as:

    • TORRES (2023)

    • KULESHOV; FENNER; ERMON (2018)

    • KULESHOV; DESHPANDE (2021)

  • However, it can be hard for the user to learn and implement these methods.

  • In this sense, packages may be a useful way to share these routines.

Available Packages

  • R: probably

  • Python: ml_insights

  • These are only global, focus on classification problems, and are only applicable in the covariate space.

Objective:

  • Develop an R package offering a recalibration method for regression that can be applied locally and at any layer of a neural network, in order to make recalibration practices easier.

Methods: R package

  • Optimize and wrap the code in user-friendly functions:

    • Conventions from WICKHAM; BRYAN (2015).
  • IDE: RStudio; documentation with the roxygen2 package and vignettes/paper in .Rmd.

  • The package consists of two fundamental components: model diagnostics and model recalibration.

Methods: Recalibration method

  • Method:

    • TORRES (2023): calibration across various representations of the covariate space, which makes it useful for Artificial Neural Networks (ANNs).

Results

  • Package available in the cmusso86/recalibratiNN GitHub repository.

  • The package can be installed and loaded using the following code:

if (!require("pacman")) install.packages("pacman")
pacman::p_load_current_gh("cmusso86/recalibratiNN")

Results

  • 7 functions & 10 dependencies: stats, 6 from the tidyverse, RANN (approximate KNN), Hmisc (weighted variance), and glue (customization of plots).

  • Gaussian models: linear models fitted by OLS, or ANNs trained with the mean squared error (MSE) loss function.

  • Calibration across various representations of the covariate space.

PIT-Values: Global

  • Recalibration set

  • Gaussian models: \(U = \hat{F}_Y(y) = \Phi\left(\frac{y - \hat{y}}{\hat{\sigma}}\right)\).

  • Inputs: \(y_{cal}\), \(\hat{y}_{cal}\), and the MSE.

  • In base R: pnorm(y_cal, y_hat_cal, sqrt(MSE_cal)); with the package: PIT_global(y_cal, y_hat_cal, MSE_cal).

  • From the linear model presented earlier:

pit <- PIT_global(ycal = y_cal, # true values from the calibration set
                  yhat = y_hat_cal, # predictions for the calibration set
                  mse  = MSE_cal) # MSE from the calibration set

head(pit, 5) # observe the first values.
[1] 0.04551257 0.42522358 0.81439164 0.69119416 0.44043239

PIT-values: local

pit_local <- PIT_local(xcal = x_cal, 
                       ycal = y_cal, 
                       yhat = y_hat_cal, 
                       mse = MSE_cal,
                       clusters = 6,
                       p_neighbours = 0.2,
                       PIT = PIT_global)
head(pit_local, 5) 
# A tibble: 5 × 5
  part    y_cal y_hat   pit     n
  <glue>  <dbl> <dbl> <dbl> <dbl>
1 part_1 -27.9  -8.12 0.456   400
2 part_1   8.63 -8.22 0.538   400
3 part_1  67.4  -7.58 0.663   400
4 part_1   1.58 -7.44 0.520   400
5 part_1  18.4  -8.47 0.560   400

Observing miscalibration

Global

gg_PIT_global(pit,
              type = "density",
              fill = "steelblue4",
              alpha = 0.8,
              print_p = TRUE)

Observing miscalibration

Global

gg_CD_global(pit,
             y_cal, 
             y_hat_cal, 
             MSE_cal) 

Observing miscalibration

Local

gg_PIT_local(pit_local, 
             alpha = 0,
             linewidth = 1,
             pal = "Set2",
             facet = FALSE)

Observing miscalibration

gg_PIT_local(pit_local, alpha = 1, pal = "Greens", facet = TRUE)

Observing miscalibration

Local

gg_CD_local(
  pit_local,
  psz = 0.01,
  abline = "red",
  pal = "PiYG",
  facet = TRUE)

Recalibration

Global

# new data 
x_new <- runif(n/5, 1, 10)
y_hat_new <- predict(model,
                     data.frame(
                       x_train=x_new)
                     )

# recalibration
rec_global <- recalibrate(yhat_new=y_hat_new,
                   pit_values=pit,
                   mse=MSE_cal,
                   type="global")

names(rec_global)
[1] "y_hat_calibrated"     "y_var_calibrated"     "y_samples_calibrated"
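The returned Monte Carlo samples can be turned into recalibrated predictive intervals. A base-R sketch, assuming y_samples_calibrated stores one row of draws per new observation (the stand-in matrix below is illustrative; check the package documentation for the exact layout):

```r
# Stand-in for rec_global$y_samples_calibrated: one row per new
# observation, one column per Monte Carlo draw (illustrative data)
set.seed(7)
samples <- matrix(rnorm(10 * 1000, mean = 5, sd = 2), nrow = 10)

# 95% predictive intervals from the empirical sample quantiles
lower <- apply(samples, 1, quantile, probs = 0.025)
upper <- apply(samples, 1, quantile, probs = 0.975)
width <- upper - lower  # interval width per new observation
```

The same quantile computation applies directly to the object returned by recalibrate().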

Recalibration

Local

# recalibration
rec_loc <- recalibrate(yhat_new = y_hat_new,
                   space_new = x_new,
                   space_cal = x_cal,
                   pit_values = pit,
                   mse = MSE_cal,
                   type = "local",
                   p_neighbours = 0.2, 
                   epsilon = 0.1)

names(rec_loc)
[1] "y_hat_calibrated"         "y_var_calibrated"        
[3] "y_samples_calibrated_wt"  "y_samples_calibrated_raw"
[5] "y_kernel"                

Checking in Test Set: Global calibration

Empirical PIT distribution.

Checking in Test Set: Local calibration

Empirical PIT distribution.

Checking in Test Set

Scatter plot and true mean.

Neural Networks

Neural Network example

library(keras)

model <- keras_model_sequential() # simple MLP

model %>%
  layer_dense(input_shape = 1, # one covariate
              units = 200,     # 200 neurons
              activation = "sigmoid") %>% # non-linearity
  layer_dense(units = 1, # output
              activation = "linear") # regression

# compiling model
model %>%
  compile(optimizer = optimizer_adam(
    learning_rate = 0.01),
    loss = "mse") # MSE loss (Gaussian likelihood)

# fitting model
model %>%
  fit(x = x_train, y = y_train,
      validation_data = list(x_cal, y_cal),
      callbacks = callback_early_stopping(
        monitor = "val_loss",
        patience = 20,
        restore_best_weights = TRUE),
      batch_size = split*n, # batch size
      epochs = 500)

# predicting output values
y_hat_cal <- predict(model, x_cal) # predictions cal
y_hat_new <- predict(model, x_new) # predictions new


# predictions from the intermediate layer
layer_name <- paste('dense', 1, sep = '_')
layer_model <- keras_model(
  inputs = model$input,
  outputs = get_layer(model, layer_name)$output
)

h_new <- layer_model %>% predict(x_new)
h_cal <- layer_model %>% predict(x_cal)


# MSE
MSE_cal <- model %>%
  evaluate(x_cal, y_cal)
MSE_cal <- as.numeric(MSE_cal)

NN calibration

Recalibration

pit <- PIT_global(y_cal, y_hat_cal, MSE_cal)

recal_NN_h <- recalibrate(
  yhat_new = y_hat_new,
  pit_values = pit,
  mse = MSE_cal,
  space_cal = h_cal, # matrix of hidden-layer representations
  space_new = h_new, # matrix of hidden-layer representations
  type = "local"
)
  • In this simple one-covariate example, recalibrating on the hidden layer is not particularly advantageous.

Evaluation on Test Set

Conclusions and Future Work

  • Effective Visualization of Miscalibration.
  • Accurate Implementation of Torres Method.
    • Torres' method improves calibration, especially with local recalibration.
  • Advantages over other packages:
    • Focused on regression models
    • Local recalibration
    • Recalibration at intermediate layers.
  • It is available for prompt use on GitHub and can be installed directly from the console.

Future Developments:

  • Seamless integration with other packages, broader input types, and compatibility with various cross-validation methods.
  • Handle models with arbitrary predictive distributions.

References

GOODFELLOW, I.; BENGIO, Y.; COURVILLE, A. Deep learning. [s.l.] MIT Press, 2016.
KULESHOV, V.; DESHPANDE, S. Calibrated and sharp uncertainties in deep learning via density estimation. International conference on machine learning. Anais...2021.
KULESHOV, V.; FENNER, N.; ERMON, S. Accurate uncertainties for deep learning using calibrated regression. International conference on machine learning. Anais...PMLR, 2018.
TORRES, R. Quantile-based Recalibration of Artificial Neural Networks. Master’s thesis—Distrito Federal, Brazil: University of Brasília, 2023.
WICKHAM, H.; BRYAN, J. R packages: Organize, test, document, and share your code. [s.l.] O’Reilly Media, Inc., 2015.